PIPELINING OF MULTIPLEXER LOOPS IN A DIGITAL CIRCUIT 



Inventor: Keshab K. Parhi 

FIELD OF THE INVENTION 

[0001] The present invention relates to digital circuits. More particularly, the 

present invention relates to pipelining of multiplexer loops in a digital circuit. 

BACKGROUND OF THE INVENTION 

[0002] Communicating information via the internet and other digital 

communications systems has become common in the United States and 
elsewhere. As the number of people using these communications systems has 
increased so has the need for transmitting digital data at ever increasing rates. 

[0003] As will be understood by persons skilled in the relevant arts, digital 

communications systems are designed, for example, using look-ahead, pipelining, 
and parallelism techniques. These known techniques have enabled engineers to 
build digital communications systems, using available manufacturing 
technologies, which operate at data rates in excess of 1 Gb/s. These known 
techniques, however, cannot always be applied successfully to the design of 
higher speed digital communications systems. Applying these techniques is 
particularly difficult when dealing with nested feedback loops or multiplexer 
loops. 

[0004] The use of look-ahead, for example, for fast computation of recursive 

loops is known. However, there are several approaches that can be used in 
applying look-ahead in the context of a multiplexer loop such as, for example, the 
multiplexer loop of a decision feedback equalizer found in modern transceivers. 
Many of these approaches will not improve the performance of the digital circuit 
to which they are applied, and some of these approaches can even degrade circuit 
performance. In similar fashion, the application of known pipelining and
parallelism techniques to nested feedback loops or multiplexer loops in
high-speed digital communications systems will not necessarily result in improved
performance.

[0005] There is a current need for new design techniques and digital logic circuits 

that can be used to build high-speed digital communication systems. In 
particular, design techniques and digital logic circuits are needed which can be 
used to build digital communications circuits that operate in excess of 2.5 Gb/s. 

BRIEF SUMMARY OF THE INVENTION 

[0006] A digital logic circuit and method for determining an output value based
on a plurality of input values are provided. As described herein, the present
invention can be used in a wide range of applications. The invention is
particularly suited to high-speed digital communications systems, although the
invention is not limited to just these systems.

[0007] In an embodiment of the invention, an n-level look-ahead network
converts a plurality of input values to a plurality of intermediate values. These
intermediate values are provided to a plurality of multiplexers. Each multiplexer
has at least a first and a second input port, an output port, and a control port. The
plurality of multiplexers are arranged to form a pipelined multiplexer loop having
at least a first and a second stage. The multiplexers of the pipelined multiplexer
loop are electrically coupled to the n-level look-ahead network.
[0008] In an embodiment, the first stage of the pipelined multiplexer loop
consists of a single 2-to-1 multiplexer. The second stage consists of at least two
2-to-1 multiplexers. Communication links electrically couple the output ports of
the second stage multiplexers to the input ports of the first stage multiplexer. A
first feedback loop electrically couples the output port of the first stage
multiplexer to the control port of the first stage multiplexer. This first feedback
loop has a first delay device having a first delay time. A second feedback loop
couples the output port of the first stage multiplexer to the control ports of the
second stage multiplexers. This second feedback loop includes the first delay
device and a second delay device having a second delay time. The first delay
time is an integer multiple of the second delay time and is equal to (n+1) times
a clock period of operation of the digital logic circuit.

[0009] In an embodiment, the n-level look-ahead network is a 2-level look-ahead
network. The 2-level look-ahead network is formed using a plurality of 2-to-1
multiplexers. In this embodiment, the first delay time is nominally three times
the second delay time.

[0010] In an embodiment, the digital logic circuit of the invention forms part of 

a transceiver circuit. For example, the digital logic circuit of the invention can 
be used to form a decision feedback equalizer. The invention can be used, for 
example, in backplane, optical/fiber, twisted-pair, and coaxial cable transceivers. 

[0011] It is a feature of the invention that it can be used to form part of a 

communications system operating at a data rate of at least 3 gigabits per second. 

[0012] Further features and advantages of the present invention, as well as the 

structure and operation of various embodiments of the present invention, are 
described in detail below with reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES 

[0013] The present invention is described with reference to the accompanying 

figures. In the figures, like reference numbers indicate identical or functionally 
similar elements. Additionally, the left-most digit or digits of a reference number 
identify the figure in which the reference number first appears. The 
accompanying figures, which are incorporated herein and form part of the 
specification, illustrate the present invention and, together with the description, 
further serve to explain the principles of the invention and to enable a person 
skilled in the relevant art to make and use the invention. 

[0014] FIG. 1 illustrates an example 2-tap decision feedback equalizer circuit. 

[0015] FIG. 2 illustrates an example circuit of a reformulated version of the
decision feedback equalizer circuit of FIG. 1, where all four possible inputs are
precomputed, and where an output is selected using a 4-to-1 multiplexer with two
previous outputs acting as control signals.
[0016] FIG. 3 illustrates how a 4-parallel embodiment decision feedback 

equalizer circuit is used in a backplane, fiber, or cable transceiver operating at 
high speed. 

[0017] FIG. 4 illustrates a 64-to-1 multiplexer loop.

[0018] FIG. 5 illustrates a circuit having a single feedback loop. 

[0019] FIG. 6 illustrates a circuit having three feedback loops. 

[0020] FIG. 7 illustrates a 4-unfolded multiplexer loop circuit obtained by 

unfolding the multiplexer loop of FIG. 4 by a factor of four. 
[0021] FIG. 8 illustrates two cut-sets that can be used to retime the circuit of FIG. 7.

[0022] FIG. 9 illustrates the retimed 4-unfolded multiplexer loop of FIG. 7. 

[0023] FIG. 10 illustrates a 2-to-1 multiplexer loop circuit.

[0024] FIG. 11 illustrates a circuit that can be formed by applying pipelining and
look-ahead to the circuit of FIG. 10.
[0025] FIG. 12 illustrates a 4-to-1 multiplexer loop circuit.

[0026] FIG. 13 illustrates a circuit developed by applying a first form of 

pipelining and look-ahead to the circuit of FIG. 12. 
[0027] FIG. 14 illustrates a circuit developed by applying a second form of 

pipelining and look-ahead to the circuit of FIG. 12. 
[0028] FIG. 15A illustrates a circuit according to an embodiment of the 

invention. 

[0029] FIG. 15B illustrates a circuit having a 3-level look-ahead network
according to an embodiment of the invention.
[0030] FIG. 16 illustrates a 64-to-1 multiplexer loop that incorporates the circuit
of FIG. 15A.

[0031] FIG. 17 illustrates a critical path of a 4-unfolded multiplexer loop based
on the circuit of FIG. 16.
[0032] FIG. 18 illustrates two cut-sets that can be used to retime the circuit of
FIG. 17.

[0033] FIG. 19 illustrates the retimed circuit of FIG. 17.

[0034] FIG. 20 illustrates an 8-to-1 multiplexer loop.

[0035] FIG. 21 illustrates a circuit formed by applying the look-ahead and
pipelining techniques of the invention to the 8-to-1 multiplexer loop of FIG. 20.
[0036] FIG. 22 illustrates a flowchart of the steps of a method for pipelining 

multiplexer loops that form part of an integrated circuit according to an 
embodiment of the invention. 
[0037] FIG. 23 illustrates a serial representation of a 3-tap decision feedback
equalizer.

[0038] FIG. 24 illustrates a serial representation of a 3-tap decision feedback
equalizer having 2-levels of look-ahead according to the invention.

[0039] FIG. 25 illustrates a 2-level look-ahead network according to an
embodiment of the invention.

[0040] FIG. 26 illustrates a 4-unfolded comparator circuit with f,-latch and
pipeline-registers.

[0041] FIG. 27 illustrates a 6-bit compare circuit. 

[0042] FIG. 28 illustrates a serializer/deserializer 4-tap decision feedback 

equalizer integrated circuit according to an embodiment of the invention. 



DETAILED DESCRIPTION OF THE INVENTION 



[0043] Modern digital communications systems contain circuits having feedback 

loops. These circuits are used to perform a variety of functions. For example, 
FIG. 1 illustrates a circuit 100 having two feedback loops. Circuit 100 is a 2-tap 
decision feedback equalizer (DFE). 

[0044] Circuit 100 has two delay devices 102, 104 and a threshold device 106.
In an embodiment, delay devices 102, 104 are flip-flops. In other embodiments,
other devices such as registers are used. As will be understood by persons skilled
in the relevant arts, the outputs of these devices change in accordance with a clock
signal. Thus, the performance or rate at which circuit 100 can process data is
limited by a clock period of operation. For circuit 100, the clock period of
operation is limited by one multiply, two adds, and a thresholding (or compare)
operation. However, for binary signaling (i.e., where a(n) is "0" or "1"),
multiplication by "0" or "1" is typically not a factor.
[0045] The rate at which data is processed in a digital communications system
can be increased through the use of parallelism or unfolding. For example, fast
DFE implementations typically reformulate the DFE loop computation based on
parallel branch delayed decision techniques where all possible outputs are
computed and the correct output is selected by a multiplexer. The multiplexer is
typically controlled by one or more previous outputs. In such implementations,
the feedback loop is limited to a multiplexer delay only. The fastest computation
time (i.e., the maximum operating speed) of a 2-to-1 multiplexer built using
0.13 micron photolithography technology is about 0.2 nanoseconds (ns).

[0046] FIG. 2 illustrates a circuit 200 formed by reformulating circuit 100 using
a parallel branch delayed decision technique. Circuit 200 can process data at a
higher rate than circuit 100. As illustrated in FIG. 2, circuit 200 has two delay
devices 202, 204, four threshold devices 206, 208, 210, 212, and a 4-to-1
multiplexer 214. The inputs to the four threshold devices 206, 208, 210, 212
must be computed. As would be known to persons skilled in the relevant arts, the
performance of circuit 200 is inherently limited by the operating performance of
multiplexer 214. In general, an X-tap DFE can be reformulated and implemented
using 2^X comparators and a 2^X-to-1 multiplexer. The speed is limited by the
2^X-to-1 multiplexer. It should be noted that if the signal a(n) has "Y" possible values
or levels, it can be represented using a word-length of "b" bits, where "b" equals
⌈log2(Y)⌉ (i.e., log of Y with respect to base 2) and the function ⌈r⌉ represents the
ceiling function, which returns the smallest integer greater than or equal to "r."
Such signals are often referred to as PAM-Y modulated signals (e.g., PAM-4 or
PAM-5 modulated signals), which represent a signal with "Y" levels represented
by pulse amplitude modulation. For a system using such signals, an X-tap DFE
can be reformulated and implemented using Y^X comparators and a Y^X-to-b
multiplexer.
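
As an illustrative sketch only, the comparator count and word-length relationships described above can be tabulated as follows. The function name dfe_resources and its parameters are hypothetical and do not appear in the figures; the sketch assumes at least one tap and at least two signal levels.

```python
def dfe_resources(taps: int, levels: int = 2) -> tuple[int, int]:
    """Return (comparators, bits_per_symbol) for a reformulated X-tap DFE.

    comparators = levels ** taps (this is also the number of multiplexer inputs)
    bits        = ceil(log2(levels)), computed with exact integer arithmetic
    """
    if taps < 1 or levels < 2:
        raise ValueError("expected taps >= 1 and levels >= 2")
    comparators = levels ** taps
    bits = (levels - 1).bit_length()  # equals ceil(log2(levels)) for levels >= 2
    return comparators, bits


print(dfe_resources(2))            # binary 2-tap DFE, as in FIG. 2
print(dfe_resources(6))            # binary 6-tap DFE
print(dfe_resources(3, levels=5))  # hypothetical 3-tap PAM-5 case
```

For binary signaling the word-length is one bit, so the Y^X-to-b multiplexer reduces to the 2^X-to-1 multiplexer described for circuit 200.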

[0047] FIG. 3 illustrates a circuit 300 that implements a serdes
(serializer/deserializer) for a backplane application, which makes use of a
4-parallel embodiment of a DFE. As shown in FIG. 3, circuit 300 has a DFE 302,
a precursor filter 304, four analog-to-digital converters (ADC) 306a, 306b, 306c,
306d, four programmable gain amplifiers (PGA) 308a, 308b, 308c, 308d, a timing
recovery circuit 310, and an automatic gain control circuit 312.

[0048] A 6-tap DFE can be implemented using 64 comparators and a 64-to-1
multiplexer loop in a serial implementation. A 64-to-1 multiplexer loop 400 is
illustrated in FIG. 4. 64-to-1 multiplexer loop 400 is implemented using
sixty-three 2-to-1 multiplexers 402. 64-to-1 multiplexer loop 400 requires 32 instances
of 2-to-1 multiplexer 402a, 16 instances of 2-to-1 multiplexer 402b, 8 instances
of 2-to-1 multiplexer 402c, 4 instances of 2-to-1 multiplexer 402d, 2 instances of
2-to-1 multiplexer 402e, and 1 instance of 2-to-1 multiplexer 402f.
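
The stage counts listed above follow from halving the number of signals at each stage. A minimal sketch, assuming a binary tree of 2-to-1 multiplexers with a power-of-two number of inputs (the helper name mux_tree_stage_counts is hypothetical):

```python
def mux_tree_stage_counts(num_inputs: int) -> list[int]:
    """Number of 2-to-1 multiplexers in each stage, from the input side inward."""
    if num_inputs < 2 or num_inputs & (num_inputs - 1):
        raise ValueError("num_inputs must be a power of two and at least 2")
    counts = []
    width = num_inputs
    while width > 1:
        width //= 2           # each stage halves the number of candidate signals
        counts.append(width)
    return counts


stages = mux_tree_stage_counts(64)
print(stages)       # stages 402a through 402f
print(sum(stages))  # total number of 2-to-1 multiplexers
```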

[0049] As will be understood by persons skilled in the relevant arts, 2-to-1
multiplexer 402f is highly loaded. Fan-out and a large capacitance typically
reduce the expected performance of 64-to-1 multiplexer loop 400. For example,
a typical computation time for multiplexer 402f, loaded as shown in FIG. 4, is
about 0.45 ns (i.e., more than twice the 0.2 ns that can be achieved if multiplexer
402f were not heavily loaded).

[0050] As illustrated in FIG. 4, 64-to-1 multiplexer loop 400 has six delay
devices 404a, 404b, 404c, 404d, 404e, 404f. These six delay devices form part
of six nested feedback loops. As described herein, nested feedback loops limit
the applicability of known design techniques used by engineers to build
high-speed digital communications systems.

[0051] In order to understand how nested feedback loops limit the applicability
of known design techniques, and how the present invention overcomes the
limitations of the known design techniques, it is useful to consider an example
design in which both known design techniques and the techniques of the present
invention are applied to 64-to-1 multiplexer loop 400. First, however, the
following terms are defined so that they may be used in the description that
follows: loop; loop bound; critical loop; and iteration bound.

[0052] As used herein, "loop" means a directed path that begins and ends at the
same node of a circuit.

[0053] As used herein, "loop bound" means a calculated time, wherein the loop
bound of the j-th loop of a circuit is given by EQ. 1:

$$\text{loop bound}_j = \frac{T_j}{W_j} \qquad \text{(EQ. 1)}$$

where T_j is the loop computation time and W_j is the number of delays in the loop.
This point is further illustrated by the circuit in FIG. 5.

[0054] FIG. 5 illustrates a circuit 500 having a single loop (i.e., a feedback loop).
This single loop contains two delays (shown in FIG. 5 as a single delay device
502 such as, for example, a 2-bit shift register or 2 latches in series). Circuit 500
has an adder 504 and a multiplier 506. The output of circuit 500, y(n), is given
by EQ. 2:

$$y(n) = a \cdot y(n-2) + x(n) \qquad \text{(EQ. 2)}$$

Assuming that the combined computation time of adder 504 and multiplier 506
is 10 ns, the loop bound of the feedback loop of circuit 500 is 5 ns (i.e., 10 ns/2
delays = 5 ns).

[0055] As used herein, "critical loop" means the loop of a circuit having the 

longest loop bound. A circuit may have more than just one critical loop. 
[0056] As used herein, "iteration bound" means the loop bound of the critical 

loop of a circuit. This point is further illustrated by FIG. 6. 



[0057] FIG. 6 illustrates a circuit 600 having three loops 602, 604, 606. Loop
602 starts at node A, goes to node B, and returns to node A. Loop 602 contains
a single delay 603. Loop 604 starts at node A, goes to node B, goes to node C,
and returns to node A. Loop 604 contains two delays 605. Loop 606 starts at
node B, goes to node C, goes to node D, and returns to node B. Loop 606 also
contains two delays 607. As shown in FIG. 6, the computation time of node A
is 10 ns. The computation time of node B is 2 ns. The computation time of node
C is 3 ns. The computation time of node D is 5 ns.
[0058] In accordance with EQ. 1, the loop bound of loop 602 is 12 ns ((10 ns +
2 ns)/1 delay = 12 ns). The loop bound of loop 604 is 7.5 ns ((10 ns + 2 ns + 3
ns)/2 delays = 7.5 ns). The loop bound of loop 606 is 5 ns ((2 ns + 3 ns + 5 ns)/2
delays = 5 ns). Thus, the iteration bound of circuit 600 is 12 ns (i.e., the maximum
of 12 ns, 7.5 ns, and 5 ns).
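
The loop bound and iteration bound calculations above can be sketched as follows. The function names and the data layout are assumptions made for illustration. The node computation times are those given for FIG. 6, with node D taken as 5 ns, which is the value implied by the 5 ns loop bound of loop 606.

```python
def loop_bound(computation_times_ns, num_delays):
    """EQ. 1: loop computation time T_j divided by the number of delays W_j."""
    if num_delays < 1:
        raise ValueError("a loop must contain at least one delay")
    return sum(computation_times_ns) / num_delays


def iteration_bound(loops):
    """Loop bound of the critical loop, i.e., the largest loop bound."""
    return max(loop_bound(times, delays) for times, delays in loops)


NODE_NS = {"A": 10, "B": 2, "C": 3, "D": 5}

# Each entry: (computation times of the nodes on the loop, delays in the loop)
CIRCUIT_600 = [
    ([NODE_NS["A"], NODE_NS["B"]], 1),                # loop 602
    ([NODE_NS["A"], NODE_NS["B"], NODE_NS["C"]], 2),  # loop 604
    ([NODE_NS["B"], NODE_NS["C"], NODE_NS["D"]], 2),  # loop 606
]

print(loop_bound([10], 2))                         # circuit 500 of FIG. 5
print([loop_bound(t, w) for t, w in CIRCUIT_600])  # loops 602, 604, 606
print(iteration_bound(CIRCUIT_600))                # circuit 600
```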

[0059] As can be seen in FIG. 4, every feedback loop of 64-to-1 multiplexer loop
400 is a critical loop. The iteration bound of 64-to-1 multiplexer loop 400 is the
computation time of a single 2-to-1 multiplexer 402.
[0060] A design example will now be described in order to illustrate the present
invention and how the present invention overcomes the deficiencies of techniques
and digital logic circuits known in the relevant arts.
[0061] As described above, fan-out and a large capacitance typically degrade the
expected performance of 64-to-1 multiplexer loop 400. This problem is
compounded when unfolding or parallelism techniques are applied in order to
design a high-speed digital communications system. To illustrate this point,
consider the following example in which known design techniques are applied to
64-to-1 multiplexer loop 400 in order to build a high-speed digital
communications system.
[0062] The example starts by assuming that a maximum clocking rate of 500
MHz can be achieved using an available manufacturing technology. Given a
maximum achievable clocking rate of 500 MHz, the clocking period of the
example circuit will be 2 ns. It will be assumed for purposes of the example that
an iteration bound of less than 1.7 ns must be achieved in order to provide
sufficient operating margin; otherwise, the circuit design will be unacceptable.
[0063] The example involves designing a 4-parallel implementation of a 6-tap
DFE. FIG. 7 illustrates a 4-unfolded multiplexer loop circuit 700, which is
obtained from the 64-to-1 multiplexer loop circuit 400 shown in FIG. 4. Circuit
700 contains several 2-to-1 multiplexers 402 and several delays 404. The critical
path of circuit 700 is illustrated by a dashed line 702. As can be seen in FIG. 7,
the critical path involves nine 2-to-1 multiplexers 402. The expected
computation time of nine 2-to-1 multiplexers is 1.8 ns (i.e., 9 x 0.2 ns = 1.8 ns).
Thus, circuit 700 does not satisfy the design criterion of having an iteration bound
of less than 1.7 ns. As would be known to persons skilled in the relevant arts,
retiming may be used to reduce the number of 2-to-1 multiplexers 402 in the
critical path.

P 

[0064] FIG. 8 illustrates two cut-sets 802, 804 that can be used to reduce the
number of 2-to-1 multiplexers 402 in the critical path of circuit 700.

[0065] FIG. 9 illustrates the retimed 4-unfolded loop of FIG. 8. As can be seen
in FIG. 9, the critical path (shown by a dashed line 902) now involves just four
2-to-1 multiplexers 402. This is misleading, however, because as described
above, the four multiplexers (F_0, F_1, F_2, and F_3) in the critical path are heavily
loaded. Rather than having an expected iteration bound of 0.8 ns (i.e., 4 x 0.2 ns
= 0.8 ns), the actual iteration bound is 1.8 ns (i.e., 4 x 0.45 ns = 1.8 ns). Thus, as
illustrated by FIG. 9, the known techniques of unfolding and retiming cannot be
applied successfully to the nested loops of 64-to-1 multiplexer loop 400. Applying
these known techniques has led to an unacceptable circuit design.
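
The path-delay arithmetic of this example can be checked with a short sketch. Times are held as integer picoseconds to avoid floating-point rounding; the constant and function names are hypothetical, and the per-multiplexer delays are the 0.2 ns and 0.45 ns figures quoted above.

```python
MUX_PS = 200         # lightly loaded 2-to-1 multiplexer (0.2 ns)
LOADED_MUX_PS = 450  # heavily loaded 2-to-1 multiplexer (0.45 ns)
LIMIT_PS = 1700      # design criterion: iteration bound below 1.7 ns


def path_delay_ps(stage_delays_ps):
    """Delay of a combinational path made of multiplexers in series."""
    return sum(stage_delays_ps)


def meets_criterion(delay_ps):
    return delay_ps < LIMIT_PS


unfolded = path_delay_ps([MUX_PS] * 9)                # critical path 702
retimed_expected = path_delay_ps([MUX_PS] * 4)        # if F_0..F_3 were unloaded
retimed_actual = path_delay_ps([LOADED_MUX_PS] * 4)   # critical path 902 as loaded

for name, delay in [("unfolded", unfolded),
                    ("retimed, expected", retimed_expected),
                    ("retimed, actual", retimed_actual)]:
    print(f"{name}: {delay} ps, meets criterion: {meets_criterion(delay)}")
```

Under these figures, retiming shortens the path from nine multiplexers to four but leaves the delay at 1.8 ns, because each remaining multiplexer is heavily loaded.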
[0066] Another known technique that can be used to improve the iteration bound
of a circuit is pipelining combined with look-ahead. This technique is illustrated
by FIGs. 10 and 11.

[0067] FIG. 10 illustrates a 2-to-1 multiplexer loop circuit 1000. Circuit 1000
has a 2-to-1 multiplexer 1002 and a delay 1004. Assuming multiplexer 1002 has
a computation time of 0.2 ns, the iteration bound of circuit 1000 is 0.2 ns (i.e., 0.2
ns/1 delay = 0.2 ns). The output, a_n, of circuit 1000 is given by EQ. 3:

$$a_n = A_n a_{n-1} + B_n \bar{a}_{n-1} \qquad \text{(EQ. 3)}$$

[0068] FIG. 11 illustrates a circuit 1100 that can be formed by applying
pipelining and look-ahead to circuit 1000. Circuit 1100 has a delay 1102, a
multiplexer 1104, and a multiplexer 1106 in addition to the multiplexer 1002 of
circuit 1000. The output of circuit 1100 is given by EQ. 5, which is obtained by
substituting previous iterations of EQ. 3 and EQ. 4 in EQ. 3.

$$\bar{a}_n = \bar{A}_n a_{n-1} + \bar{B}_n \bar{a}_{n-1} \qquad \text{(EQ. 4)}$$

$$\begin{aligned}
a_n &= A_n \left[ A_{n-1} a_{n-2} + B_{n-1} \bar{a}_{n-2} \right] + B_n \left[ \bar{A}_{n-1} a_{n-2} + \bar{B}_{n-1} \bar{a}_{n-2} \right] \\
    &= \left[ A_n A_{n-1} + \bar{A}_{n-1} B_n \right] a_{n-2} + \left[ A_n B_{n-1} + B_n \bar{B}_{n-1} \right] \bar{a}_{n-2}
\end{aligned} \qquad \text{(EQ. 5)}$$



[0069] Assuming the computation time of each of the multiplexers of circuit
1100 is 0.2 ns, the iteration bound of circuit 1100 is 0.1 ns (i.e., 0.2 ns/2 delays
= 0.1 ns). Thus, the known method for applying pipelining and look-ahead to
circuit 1000 has improved the iteration bound by a factor of 2.
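
The equivalence between two applications of EQ. 3 and the look-ahead form of EQ. 5 can be checked exhaustively over all bit values. The following is a sketch only; the function names are hypothetical, and the two coefficient multiplexers are modeled abstractly rather than as specific elements of FIG. 11.

```python
from itertools import product


def mux(select: int, when_one: int, when_zero: int) -> int:
    """2-to-1 multiplexer on single bits."""
    return when_one if select else when_zero


def direct_two_steps(A_n, B_n, A_p, B_p, a_nm2):
    """Apply EQ. 3 twice: a_{n-1} from a_{n-2}, then a_n from a_{n-1}."""
    a_nm1 = mux(a_nm2, A_p, B_p)
    return mux(a_nm1, A_n, B_n)


def lookahead_step(A_n, B_n, A_p, B_p, a_nm2):
    """EQ. 5: both coefficients are formed outside the feedback loop."""
    coeff_one = mux(A_p, A_n, B_n)   # A_n A_{n-1} + ~A_{n-1} B_n
    coeff_zero = mux(B_p, A_n, B_n)  # A_n B_{n-1} + B_n ~B_{n-1}
    return mux(a_nm2, coeff_one, coeff_zero)


def lookahead_is_equivalent() -> bool:
    """Compare both forms for all 32 combinations of the five input bits."""
    return all(
        direct_two_steps(*bits) == lookahead_step(*bits)
        for bits in product((0, 1), repeat=5)
    )


print(lookahead_is_equivalent())
```

Because the coefficients depend only on the inputs A and B, only one multiplexer remains inside the loop while the loop holds two delays, which is the source of the factor-of-2 improvement.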

[0070] This is not the case, however, when the method is applied to a circuit
having nested feedback loops such as 64-to-1 multiplexer loop 400, as illustrated
by FIGs. 12-14. As will become apparent to persons skilled in the relevant arts
given the description herein, there are several approaches that can be used in
applying pipelining and look-ahead in the context of a multiplexer loop. The
known relevant art does not teach or suggest which form of pipelining and
look-ahead, if any, will improve the performance of a circuit having nested feedback
loops.

[0071] As described herein, the present invention fills this void. 

[0072] FIG. 12 illustrates a 4-to-1 multiplexer loop circuit 1200. Circuit 1200
can also be thought of as forming the first two stages of any multiplexer loop that
is 4-to-1 or larger. The first stage consists of multiplexer 1202a. The second
stage consists of multiplexers 1202b, 1202c. The output of circuit 1200 is given
by the following equations:

$$E_n = A_n a_{n-2} + B_n \bar{a}_{n-2}, \qquad \bar{E}_n = \bar{A}_n a_{n-2} + \bar{B}_n \bar{a}_{n-2} \qquad \text{(EQ. 6a)}$$

$$F_n = C_n a_{n-2} + D_n \bar{a}_{n-2}, \qquad \bar{F}_n = \bar{C}_n a_{n-2} + \bar{D}_n \bar{a}_{n-2} \qquad \text{(EQ. 6b)}$$

$$a_n = E_n a_{n-1} + F_n \bar{a}_{n-1}, \qquad \bar{a}_n = \bar{E}_n a_{n-1} + \bar{F}_n \bar{a}_{n-1} \qquad \text{(EQ. 6c)}$$

[0073] As shown in FIG. 12, circuit 1200 has three multiplexers 1202a, 1202b,
1202c, and two delay devices 1204a, 1204b. Assuming multiplexer 1202a has a
computation time of 0.4 ns (i.e., it is highly loaded) and the other two
multiplexers 1202b, 1202c each have a computation time of 0.2 ns, the iteration
bound of circuit 1200 is 0.4 ns.



[0074] FIG. 13 illustrates a circuit 1300 developed by applying one form of 

pipelining and look-ahead to circuit 1200. As described herein, this form does 
not improve the performance of circuit 1200. It is shown only so that it can be 
contrasted with the present invention. 

[0075] As shown in FIG. 13, circuit 1300 is formed by adding three delays 1302,
1304a, 1304b, and two multiplexers 1306a, 1306b to circuit 1200. The output of
circuit 1300 is given by EQs. 7a and 7b. EQ. 7a is obtained by substituting past
iterations of EQ. 6c in itself.

$$a_n = E_n \left[ E_{n-1} a_{n-2} + F_{n-1} \bar{a}_{n-2} \right] + F_n \left[ \bar{E}_{n-1} a_{n-2} + \bar{F}_{n-1} \bar{a}_{n-2} \right] \qquad \text{(EQ. 7a)}$$

$$a_n = \left[ E_n E_{n-1} + F_n \bar{E}_{n-1} \right] a_{n-2} + \left[ E_n F_{n-1} + F_n \bar{F}_{n-1} \right] \bar{a}_{n-2} \qquad \text{(EQ. 7b)}$$

[0076] Assuming the computation time of each multiplexer 1306a, 1306b is 0.2
ns, the loop bound of the inner nested loop is 0.2 ns. But the loop bound of the
outer loop is 0.4 ns. Thus, as stated above, this application is not useful for
improving the performance of a multiplexer loop.

[0077] FIG. 14 illustrates a circuit 1400 developed by applying a second form of
pipelining and look-ahead to circuit 1200. This form also is not very useful for
improving the performance of circuit 1200. This form is also shown so that it can
be contrasted with the present invention. The output of circuit 1400 is given by
EQs. 8a and 8b. EQ. 8a is obtained by substituting EQ. 6a, EQ. 6b, and a past
iteration of EQ. 6c in EQ. 6c.

$$a_n = (A_n a_{n-2} + B_n \bar{a}_{n-2})(E_{n-1} a_{n-2} + F_{n-1} \bar{a}_{n-2}) + (C_n a_{n-2} + D_n \bar{a}_{n-2})(\bar{E}_{n-1} a_{n-2} + \bar{F}_{n-1} \bar{a}_{n-2}) \qquad \text{(EQ. 8a)}$$

$$a_n = (A_n E_{n-1} + C_n \bar{E}_{n-1})\, a_{n-2} + (B_n F_{n-1} + D_n \bar{F}_{n-1})\, \bar{a}_{n-2} \qquad \text{(EQ. 8b)}$$
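
As a sanity check on the algebra, the form of EQ. 8b (coefficient A_n E_{n-1} + C_n ~E_{n-1} on a_{n-2}, and B_n F_{n-1} + D_n ~F_{n-1} on ~a_{n-2}) can be compared exhaustively against a direct evaluation of the 4-to-1 multiplexer loop equations. This is a sketch under the assumption that each product term corresponds to a 2-to-1 selection; the function names are hypothetical.

```python
from itertools import product


def mux(select: int, when_one: int, when_zero: int) -> int:
    """2-to-1 multiplexer on single bits."""
    return when_one if select else when_zero


def direct(A, B, C, D, E_prev, F_prev, a_nm2):
    """Evaluate the loop step by step from a_{n-2}."""
    a_nm1 = mux(a_nm2, E_prev, F_prev)  # a_{n-1} = E_{n-1} a_{n-2} + F_{n-1} ~a_{n-2}
    E = mux(a_nm2, A, B)                # E_n = A_n a_{n-2} + B_n ~a_{n-2}
    F = mux(a_nm2, C, D)                # F_n = C_n a_{n-2} + D_n ~a_{n-2}
    return mux(a_nm1, E, F)             # a_n = E_n a_{n-1} + F_n ~a_{n-1}


def eq_8b(A, B, C, D, E_prev, F_prev, a_nm2):
    """Collapsed form: one selection on a_{n-2} with precombined coefficients."""
    coeff_one = mux(E_prev, A, C)       # A_n E_{n-1} + C_n ~E_{n-1}
    coeff_zero = mux(F_prev, B, D)      # B_n F_{n-1} + D_n ~F_{n-1}
    return mux(a_nm2, coeff_one, coeff_zero)


def eq_8b_is_equivalent() -> bool:
    """Compare both forms for all 128 combinations of the seven input bits."""
    return all(
        direct(*bits) == eq_8b(*bits) for bits in product((0, 1), repeat=7)
    )


print(eq_8b_is_equivalent())
```

Note that the coefficients in this form still depend on E_{n-1} and F_{n-1}, which are themselves loop-dependent signals.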



[0078] Assuming the computation time of each of multiplexers 1406a, 1406b is
0.2 ns, the loop bound of the inner nested loop is 0.2 ns. The loop bound of the
outer loop is 0.3 ns. While this is an improvement over the form illustrated by
FIG. 13, it still does not resolve the decreased performance of the multiplexer
loop. In the 4-unfolded parallel design of FIG. 9, applying this form of pipelining
and look-ahead results in an expected iteration bound of about 1.2 ns, which is
less than the 1.7 ns criterion. But, for reasons described herein, this iteration
bound may not be achievable. Furthermore, as described below, the iteration
bound can be reduced even further than this by applying the pipelining and
look-ahead techniques of the invention. In comparison, the invention significantly
increases the clock speed or symbol speed that can be achieved.

[0079] In contrast to the pipelining and look-ahead forms of FIGs. 13 and 14, the
pipelining and look-ahead of FIG. 15A solves the issue of degraded multiplexer
loop performance described above. This is because the loop bound of every
feedback loop of the multiplexer loop is improved rather than just improving the
performance of one loop to the detriment of another loop.

[0080] FIG. 15A illustrates a circuit 1500, according to an embodiment of the
invention, that is formed by adding a delay 1502 and four 2-to-1 multiplexers
1504a, 1504b, 1504c, 1504d to circuit 1200. As shown in FIG. 15A, each of the
2-to-1 multiplexers has two input ports, one control port, and one output port.
Note that none of the 2-to-1 multiplexers 1504a, 1504b, 1504c, 1504d is included
in a feedback loop. These multiplexers form part of a 1-level look-ahead network
1506. The extra delay added to circuit 1200 forms a part of the innermost nested
loop.

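[0081] The output of circuit 1500 is given by the following equations:

$$a_n = (A_n a_{n-2} + B_n \bar{a}_{n-2})\, a_{n-1} + (C_n a_{n-2} + D_n \bar{a}_{n-2})\, \bar{a}_{n-1}$$

$$a_{n-1} = (A_{n-1} a_{n-3} + B_{n-1} \bar{a}_{n-3})\, a_{n-2} + (C_{n-1} a_{n-3} + D_{n-1} \bar{a}_{n-3})\, \bar{a}_{n-2}$$

$$\bar{a}_{n-1} = (\bar{A}_{n-1} a_{n-3} + \bar{B}_{n-1} \bar{a}_{n-3})\, a_{n-2} + (\bar{C}_{n-1} a_{n-3} + \bar{D}_{n-1} \bar{a}_{n-3})\, \bar{a}_{n-2}$$

$$\begin{aligned}
a_n &= A_n (A_{n-1} a_{n-3} + B_{n-1} \bar{a}_{n-3})\, a_{n-2} \\
    &\quad + B_n (C_{n-1} a_{n-3} + D_{n-1} \bar{a}_{n-3})\, \bar{a}_{n-2} \\
    &\quad + C_n (\bar{A}_{n-1} a_{n-3} + \bar{B}_{n-1} \bar{a}_{n-3})\, a_{n-2} \\
    &\quad + D_n (\bar{C}_{n-1} a_{n-3} + \bar{D}_{n-1} \bar{a}_{n-3})\, \bar{a}_{n-2}
\end{aligned}$$

$$\begin{aligned}
a_n &= \left[ (A_n A_{n-1} + C_n \bar{A}_{n-1})\, a_{n-3} + (A_n B_{n-1} + C_n \bar{B}_{n-1})\, \bar{a}_{n-3} \right] a_{n-2} \\
    &\quad + \left[ (B_n C_{n-1} + D_n \bar{C}_{n-1})\, a_{n-3} + (B_n D_{n-1} + D_n \bar{D}_{n-1})\, \bar{a}_{n-3} \right] \bar{a}_{n-2}
\end{aligned}$$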

[0082] The iteration bound of circuit 1500 is 0.2 ns. As will become apparent to
persons skilled in the relevant arts given the description herein, the pipelining and
look-ahead of the invention increases the performance of the nested loop without
degrading the performance of the outer loop. In fact, as can be seen in FIG. 15A,
the invention can be used to restore the performance of the multiplexer loop to
an expected level of performance (e.g., 0.2 ns).
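
The look-ahead form used for circuit 1500 can be checked in the same exhaustive manner. In the sketch below, the four values m1 to m4 stand for the outputs of four 2-to-1 selections that depend only on the inputs A, B, C, and D at times n and n-1; these names, and the mapping of each value to a particular multiplexer, are assumptions made for illustration and are not taken from FIG. 15A.

```python
from itertools import product


def mux(select: int, when_one: int, when_zero: int) -> int:
    """2-to-1 multiplexer on single bits."""
    return when_one if select else when_zero


def direct(cur, prev, a_nm2, a_nm3):
    """Two successive steps of the 4-to-1 multiplexer loop."""
    A, B, C, D = cur
    Ap, Bp, Cp, Dp = prev
    # a_{n-1} is selected from the previous inputs by (a_{n-2}, a_{n-3})
    a_nm1 = mux(a_nm2, mux(a_nm3, Ap, Bp), mux(a_nm3, Cp, Dp))
    # a_n is selected from the current inputs by (a_{n-1}, a_{n-2})
    return mux(a_nm1, mux(a_nm2, A, B), mux(a_nm2, C, D))


def lookahead(cur, prev, a_nm2, a_nm3):
    """Same output using only a_{n-2} and a_{n-3} as loop-dependent selects."""
    A, B, C, D = cur
    Ap, Bp, Cp, Dp = prev
    m1 = mux(Ap, A, C)  # A_n A_{n-1} + C_n ~A_{n-1}
    m2 = mux(Bp, A, C)  # A_n B_{n-1} + C_n ~B_{n-1}
    m3 = mux(Cp, B, D)  # B_n C_{n-1} + D_n ~C_{n-1}
    m4 = mux(Dp, B, D)  # B_n D_{n-1} + D_n ~D_{n-1}
    return mux(a_nm2, mux(a_nm3, m1, m2), mux(a_nm3, m3, m4))


def lookahead_is_equivalent() -> bool:
    """Compare both forms for all 1024 combinations of the ten input bits."""
    for bits in product((0, 1), repeat=10):
        cur, prev, a_nm2, a_nm3 = bits[0:4], bits[4:8], bits[8], bits[9]
        if direct(cur, prev, a_nm2, a_nm3) != lookahead(cur, prev, a_nm2, a_nm3):
            return False
    return True


print(lookahead_is_equivalent())
```

In this form no loop-dependent signal enters the four added selections, which is consistent with the statement that none of the added multiplexers is included in a feedback loop.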

[0083] FIG. 15B illustrates a circuit 1550 having a 3-level look-ahead network
1556 according to an embodiment of the invention. Circuit 1550 is formed by
adding a delay 1552 and 3-level look-ahead network 1556 to circuit 1200. As
shown in FIG. 15B, each of the 2-to-1 multiplexers has two input ports, one
control port, and one output port. In accordance with the invention, the extra
delay added to circuit 1200 forms a part of the innermost nested loop.

[0084] As shown in FIG. 15B, 3-level look-ahead network 1556 is formed using
multiplexers and delays. 3-level look-ahead network 1556 transforms the four
input values A_n, B_n, C_n, and D_n into four intermediate values O_1, O_2, O_3, and O_4.
As will be understood by persons skilled in the relevant arts, other circuits can be
used to implement a 3-level look-ahead network.

[0085] As described in more detail below, the invention can be implemented in
a manner that will achieve an objective not obtainable by circuits 1300 and 1400.
As described below, the invention can be implemented in a multiplexer loop such
that the performance degradation caused by the heavy loading of multiplexer
1202a is completely eliminated without increasing the loop bound of any loop.
This is achieved by adding delay to the innermost nested feedback loop and by
not adding any multiplexers within a loop of the multiplexer loop. As stated
herein, a benefit of adding delay to the innermost feedback loop is that it
improves the loop bound of every loop forming a part of the multiplexer loop.

[0086] Returning to the example design application, FIG. 16 illustrates a 64-to-1
multiplexer loop circuit 1600 that incorporates the embodiment of the invention
shown in FIG. 15A. As can be seen in FIG. 16, circuit 1600 is formed from
multiplexer loop 400 and circuit 1500. The loop bound of the loop containing
multiplexers 402d, 1504a, 1202b, 1202a and delays 1502, 1204b, 404d is 0.25 ns
(i.e., (0.2 ns + 0.2 ns + 0.2 ns + 0.4 ns)/4 delays = 0.25 ns). As will become
apparent to persons skilled in the relevant arts given the description herein, the
iteration bound of circuit 1600 is 0.25 ns.
[0087] If look-ahead network 1506 is moved to a position between multiplexers
402c and 402d, the iteration bound of circuit 1600 is not changed (i.e., it remains at
0.25 ns). The number of multiplexers included in look-ahead network 1506, however,
must be increased from 4 to 8.
[0088] In similar fashion, moving look-ahead network 1506 to a position between
multiplexers 402b and 402c or to a position between multiplexers 402a and 402b
also will not change the iteration bound of circuit 1600. The number of
multiplexers included in look-ahead network 1506, however, will have to be
increased from 4 to 16, or 4 to 32, respectively.
[0089] If look-ahead network 1506 is moved to a location before multiplexer
402a, the iteration bound of circuit 1600 is reduced. It is reduced to 0.2 ns, and
every loop of circuit 1600 becomes a critical loop. This design requires
increasing the number of multiplexers of look-ahead network 1506 from 4 to 64.
Thus, as can be seen from FIG. 16, it is advantageous to position look-ahead
network 1506 in front of multiplexer 402a.
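
The trade-off described above, in which the look-ahead network doubles in size each time it is moved one stage closer to the inputs, can be summarized in a short sketch. The placement labels, the doubling rule expressed as a bit shift, and the function name are assumptions made for illustration; the multiplexer counts and iteration bounds are the values stated above, held in picoseconds.

```python
BASE_MUXES = 4  # size of look-ahead network 1506 as drawn in FIG. 16

# (placement of the look-ahead network, resulting iteration bound in ps)
PLACEMENTS = [
    ("as drawn in FIG. 16", 250),
    ("between 402c and 402d", 250),
    ("between 402b and 402c", 250),
    ("between 402a and 402b", 250),
    ("before 402a", 200),
]


def lookahead_cost_table():
    """Return (placement, look-ahead multiplexer count, iteration bound in ps)."""
    return [
        (name, BASE_MUXES << stages_moved, bound_ps)
        for stages_moved, (name, bound_ps) in enumerate(PLACEMENTS)
    ]


for name, muxes, bound_ps in lookahead_cost_table():
    print(f"{name}: {muxes} multiplexers, iteration bound {bound_ps} ps")
```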
[0090] FIG. 17 illustrates a 4-unfolded circuit 1700 having a critical path 1702.
Critical path 1702 is illustrated by a dashed line. As shown in FIG. 17, circuit
1700 is formed using circuit 1600. Circuit 1700 contains several 2-to-1
multiplexers 402 and several delays 404.
[0091] Critical path 1702 has eight 2-to-1 multiplexers. As described herein, the
computation time of these 2-to-1 multiplexers is 2.0 ns (i.e., 6 x 0.2 ns + 2 x 0.4
ns = 2.0 ns). This computation time, however, does not meet the design
requirement of 1.7 ns. Thus, in accordance with the invention, retiming is used
to reduce the computation time of the circuit.
[0092] FIG. 18 illustrates two cut-sets 1802, 1804 that can be used to reduce the 
number of 2-to-1 multiplexers in the critical path of the circuit of FIG. 17. This 
will reduce the computation time of the circuit. 
[0093] FIG. 19 illustrates the retimed circuit of FIG. 16. This retimed circuit has 
two critical paths 1902, 1904. The computation time of critical path 1902 is 1.0 
ns (i.e., 2 x 0.4 ns + 1 x 0.2 ns = 1.0 ns). The computation time of critical path 
1904 is also 1.0 ns (i.e., 5 x 0.2 ns = 1.0 ns). This is well below the required 
design criterion of 1.7 ns, and better than that which can be achieved when the 
invention is not used. By computing outputs and inverted outputs for the last 
stage of multiplexers, a clock period of operation of 1.2 ns can be achieved (i.e., 
1.0 ns + 0.2 ns). 
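
The timing figures for the unfolded circuit before and after retiming can be reproduced with the sketch below. The names `path_time`, `FAST_MUX_NS` and `SLOW_MUX_NS` are illustrative assumptions; the multiplexer counts and times are those quoted in the text. The final throughput line additionally assumes that the 4-unfolded circuit produces four one-bit outputs per clock period, which the description does not state explicitly.

```python
FAST_MUX_NS = 0.2   # computation time of an ordinary 2-to-1 multiplexer
SLOW_MUX_NS = 0.4   # computation time of the slower multiplexers on the path
REQUIRED_NS = 1.7   # design requirement quoted in the text

def path_time(num_fast, num_slow):
    """Computation time of a path made of fast and slow 2-to-1 multiplexers."""
    return num_fast * FAST_MUX_NS + num_slow * SLOW_MUX_NS

before_retiming = path_time(6, 2)   # critical path 1702
path_1902 = path_time(1, 2)         # retimed critical path 1902
path_1904 = path_time(5, 0)         # retimed critical path 1904

# One extra multiplexer delay for computing outputs and inverted outputs.
clock_period = max(path_1902, path_1904) + FAST_MUX_NS

# Assumption: four one-bit outputs per clock period in the 4-unfolded circuit.
throughput_gbps = 4 / clock_period

print(f"before retiming: {before_retiming:.1f} ns (requirement {REQUIRED_NS} ns)")
print(f"after retiming:  {path_1902:.1f} ns and {path_1904:.1f} ns")
print(f"clock period:    {clock_period:.1f} ns, about {throughput_gbps:.2f} Gb/s")
```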

[0094] As will become apparent to persons skilled in the relevant arts from the 
description that follows, the invention is not limited to a particular amount of 
look-ahead or a particular number of inputs-to-outputs, such as the 4-to-1 ratio 
illustrated in FIG. 15A. 

[0095] FIG. 20 illustrates an 8-to-1 multiplexer loop 2000. Multiplexer loop 
2000 is formed from a plurality of 2-to-1 multiplexers 2002 and a plurality of 
delays 2004, as shown in FIG. 20. The output of multiplexer loop 2000 is given 
by the following equations, in which an overbar denotes a logical complement: 

$$
\begin{aligned}
a(n) &= A''(n)\,a(n-1) + B''(n)\,\bar{a}(n-1) \\
A''(n) &= A'(n)\,a(n-2) + B'(n)\,\bar{a}(n-2) \\
B''(n) &= C'(n)\,a(n-2) + D'(n)\,\bar{a}(n-2) \\
A'(n) &= A_n\,a(n-3) + B_n\,\bar{a}(n-3) \\
B'(n) &= C_n\,a(n-3) + D_n\,\bar{a}(n-3) \\
C'(n) &= E_n\,a(n-3) + F_n\,\bar{a}(n-3) \\
D'(n) &= G_n\,a(n-3) + H_n\,\bar{a}(n-3)
\end{aligned}
$$
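
The recursion defined by these equations can be simulated with the minimal sketch below. It assumes that each 2-to-1 multiplexer passes its first input when its select signal is 1 and its second input otherwise, and that the three delays initially hold zeros; the function names `mux`, `mux_loop_8to1` and `run_loop` are hypothetical and do not appear in the figures.

```python
def mux(sel, x, y):
    """2-to-1 multiplexer: x when sel is 1, y when sel is 0 (assumed polarity)."""
    return x if sel else y

def mux_loop_8to1(inputs, a1, a2, a3):
    """One output a(n) from inputs (A_n..H_n) and a(n-1), a(n-2), a(n-3)."""
    A, B, C, D, E, F, G, H = inputs
    a_p, b_p = mux(a3, A, B), mux(a3, C, D)      # A'(n), B'(n)
    c_p, d_p = mux(a3, E, F), mux(a3, G, H)      # C'(n), D'(n)
    a_pp, b_pp = mux(a2, a_p, b_p), mux(a2, c_p, d_p)   # A''(n), B''(n)
    return mux(a1, a_pp, b_pp)

def run_loop(input_stream, state=(0, 0, 0)):
    """Feed successive 8-tuples through the loop; state holds a(n-1), a(n-2), a(n-3)."""
    a1, a2, a3 = state
    outputs = []
    for inputs in input_stream:
        a = mux_loop_8to1(inputs, a1, a2, a3)
        outputs.append(a)
        a1, a2, a3 = a, a1, a2   # shift the new output into the delay line
    return outputs

# With only H_n set, the output is 1 only while all three stored outputs are 0.
print(run_loop([(0, 0, 0, 0, 0, 0, 0, 1)] * 3))
```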



[0096] FIG. 21 illustrates a circuit 2100 formed by applying the look-ahead and 
pipelining of the invention to circuit 2000. As shown in FIG. 21, delay 2004a has 
been replaced by a delay 2102, and a look-ahead network 2104 has been added. 
The benefits of circuit 2100 over circuit 2000 are the same as those already 
described herein for other circuits according to the invention. 

[0097] The output of circuit 2100 is described by the following equations: 

$$
\begin{aligned}
a(n) ={}& a(n-1)\big[a(n-2)\{a(n-3)A_n + \bar{a}(n-3)B_n\} + \bar{a}(n-2)\{a(n-3)C_n + \bar{a}(n-3)D_n\}\big] \\
&+ \bar{a}(n-1)\big[a(n-2)\{a(n-3)E_n + \bar{a}(n-3)F_n\} + \bar{a}(n-2)\{a(n-3)G_n + \bar{a}(n-3)H_n\}\big]
\end{aligned}
$$

$$
\begin{aligned}
a(n) ={}& a(n-2)\big\{A_n\,a(n-3)\{a(n-4)A_{n-1} + \bar{a}(n-4)B_{n-1}\} + B_n\,\bar{a}(n-3)\{a(n-4)C_{n-1} + \bar{a}(n-4)D_{n-1}\}\big\} \\
&+ \bar{a}(n-2)\big\{C_n\,a(n-3)\{a(n-4)E_{n-1} + \bar{a}(n-4)F_{n-1}\} + D_n\,\bar{a}(n-3)\{a(n-4)G_{n-1} + \bar{a}(n-4)H_{n-1}\}\big\} \\
&+ a(n-2)\big\{E_n\,a(n-3)\{a(n-4)\bar{A}_{n-1} + \bar{a}(n-4)\bar{B}_{n-1}\} + F_n\,\bar{a}(n-3)\{a(n-4)\bar{C}_{n-1} + \bar{a}(n-4)\bar{D}_{n-1}\}\big\} \\
&+ \bar{a}(n-2)\big\{G_n\,a(n-3)\{a(n-4)\bar{E}_{n-1} + \bar{a}(n-4)\bar{F}_{n-1}\} + H_n\,\bar{a}(n-3)\{a(n-4)\bar{G}_{n-1} + \bar{a}(n-4)\bar{H}_{n-1}\}\big\}
\end{aligned}
$$

$$
\begin{aligned}
a(n) ={}& \big[A_n(A_{n-1}a(n-4) + B_{n-1}\bar{a}(n-4))\,a(n-3) + B_n(C_{n-1}a(n-4) + D_{n-1}\bar{a}(n-4))\,\bar{a}(n-3)\big]\,a(n-2) \\
&+ \big[C_n(E_{n-1}a(n-4) + F_{n-1}\bar{a}(n-4))\,a(n-3) + D_n(G_{n-1}a(n-4) + H_{n-1}\bar{a}(n-4))\,\bar{a}(n-3)\big]\,\bar{a}(n-2) \\
&+ \big[E_n(\bar{A}_{n-1}a(n-4) + \bar{B}_{n-1}\bar{a}(n-4))\,a(n-3) + F_n(\bar{C}_{n-1}a(n-4) + \bar{D}_{n-1}\bar{a}(n-4))\,\bar{a}(n-3)\big]\,a(n-2) \\
&+ \big[G_n(\bar{E}_{n-1}a(n-4) + \bar{F}_{n-1}\bar{a}(n-4))\,a(n-3) + H_n(\bar{G}_{n-1}a(n-4) + \bar{H}_{n-1}\bar{a}(n-4))\,\bar{a}(n-3)\big]\,\bar{a}(n-2)
\end{aligned}
$$

$$
\begin{aligned}
a(n) ={}& \big[\{(A_nA_{n-1} + E_n\bar{A}_{n-1})\,a(n-4) + (A_nB_{n-1} + E_n\bar{B}_{n-1})\,\bar{a}(n-4)\}\,a(n-3) \\
&\quad + \{(B_nC_{n-1} + F_n\bar{C}_{n-1})\,a(n-4) + (B_nD_{n-1} + F_n\bar{D}_{n-1})\,\bar{a}(n-4)\}\,\bar{a}(n-3)\big]\,a(n-2) \\
&+ \big[\{(C_nE_{n-1} + G_n\bar{E}_{n-1})\,a(n-4) + (C_nF_{n-1} + G_n\bar{F}_{n-1})\,\bar{a}(n-4)\}\,a(n-3) \\
&\quad + \{(D_nG_{n-1} + H_n\bar{G}_{n-1})\,a(n-4) + (D_nH_{n-1} + H_n\bar{H}_{n-1})\,\bar{a}(n-4)\}\,\bar{a}(n-3)\big]\,\bar{a}(n-2)
\end{aligned}
$$
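
The equivalence between the original recursion and its look-ahead form can be checked exhaustively over all binary inputs, as in the sketch below. It assumes the same multiplexer polarity as the equations (first input selected when the control signal is 1). The function names and the ordering of the eight intermediate values are illustrative assumptions; each intermediate value is one of the eight grouped coefficients of the last equation, written as a 2-to-1 multiplexer whose select signal is an input from the previous iteration.

```python
from itertools import product

def mux(sel, x, y):
    """2-to-1 multiplexer: x when sel is 1, y when sel is 0 (assumed polarity)."""
    return x if sel else y

def loop_output(inputs, s1, s2, s3):
    """Three-stage multiplexer tree: s3 drives stage three, s2 stage two, s1 stage one."""
    A, B, C, D, E, F, G, H = inputs
    upper = mux(s2, mux(s3, A, B), mux(s3, C, D))
    lower = mux(s2, mux(s3, E, F), mux(s3, G, H))
    return mux(s1, upper, lower)

def lookahead_inputs(cur, prev):
    """Eight intermediate values, e.g. A_n*A_(n-1) + E_n*not(A_(n-1)) = mux(A_(n-1), A_n, E_n)."""
    A, B, C, D, E, F, G, H = cur
    Ap, Bp, Cp, Dp, Ep, Fp, Gp, Hp = prev
    return (mux(Ap, A, E), mux(Bp, A, E), mux(Cp, B, F), mux(Dp, B, F),
            mux(Ep, C, G), mux(Fp, C, G), mux(Gp, D, H), mux(Hp, D, H))

def count_mismatches():
    """Compare direct and look-ahead computation of a(n) for every binary case."""
    bits = (0, 1)
    mismatches = 0
    for cur in product(bits, repeat=8):
        for prev in product(bits, repeat=8):
            inter = lookahead_inputs(cur, prev)
            for a2, a3, a4 in product(bits, repeat=3):
                a1 = loop_output(prev, a2, a3, a4)          # a(n-1) from the loop
                direct = loop_output(cur, a1, a2, a3)       # a(n) via a(n-1)
                pipelined = loop_output(inter, a2, a3, a4)  # a(n) without a(n-1)
                mismatches += direct != pipelined
    return mismatches

mismatches = count_mismatches()
print("mismatches:", mismatches)
```

In the look-ahead form the output no longer depends on a(n-1), which is what allows the additional delay to be placed in the loop.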



[0098] In order to further illustrate the invention, a method embodiment will now 
be described. 

[0099] FIG. 22 illustrates a flowchart of the steps of a method 2200 for pipelining 
multiplexer loops according to an embodiment of the invention. As described herein, 
pipelined multiplexer loops according to the invention can be used to form part 
of an integrated circuit. 

[0100] Method 2200 has three steps 2202, 2204, 2206. Each of these steps will 
now be described and illustrated with an example application. 

[0101] In step 2202, a number of input values is selected. The selected number 
of inputs is provided to a pipelined multiplexer loop according to the invention 
during a clock period of operation of an integrated circuit. The number of 
selected inputs can be used to identify a particular multiplexer loop that is to be 
modified in accordance with the invention. 
[0102] In step 2204, a number of look-ahead steps is selected. The number of 
look-ahead steps is independent of the number of input values selected in step 
2202. The selected level of look-ahead is implemented as a part of the pipelined 
multiplexer loop according to the invention. 
[0103] In step 2206, a pipelined multiplexer loop according to the invention is 
implemented using, for example, a backplane or an optical/fiber technology. The 
pipelined multiplexer loop is implemented using at least one digital logic circuit 
according to the invention. The pipelined multiplexer loop is also implemented so 
that it has the number of look-ahead steps selected in step 2204. 

[0104] As described herein, in an embodiment, the digital logic circuit according 
to the invention has an n-level look-ahead network that converts the number of 
input values selected in step 2202 to a plurality of intermediate values, wherein 
n represents the number of look-ahead steps selected in step 2204. The digital 
logic circuit is formed from a plurality of multiplexers each having a first and a 
second input port, an output port, and a control port. At least some of these 
multiplexers are arranged to form the pipelined multiplexer loop. The pipelined 
multiplexer loop has at least a first and a second stage. The first stage consists 
of a first multiplexer. The second stage consists of a second and a third 
multiplexer. A first communications link couples the output port of the second 
multiplexer to the first input port of the first multiplexer. A second 
communications link couples the output port of the third multiplexer to the 
second input port of the first multiplexer. A first feedback loop, having a first 
delay time, couples the output port of the first multiplexer to the control port of 
the first multiplexer. A second feedback loop, having a second delay time, 
couples the output port of the first multiplexer to the control ports of the second 
and third multiplexers. The first delay time is an integer multiple of the second 
delay time and is equal to (n+1) times a clock period of operation of the 
integrated circuit. 
[0105] As described herein, method 2200 can be used to design or improve the 
performance of a wide variety of circuits. FIGs. 23-28 illustrate how method 
2200 is applied to design and/or improve the performance of a DFE. 
[0106] FIG. 23 illustrates a serial representation of an example of a circuit that 
can be used as part of a digital communications system to remove inter-symbol 
interference (i.e., a DFE). The DFE is formed using an 8-to-1 multiplexer loop 
2302 (similar to the 8-to-1 multiplexer loop 2000 described above) and several 
comparators 2308. The multiplexers 2304 of the multiplexer loop are similar to 
those described above, each having an expected computation time of 0.2 ns. The 
multiplexer loop has three delays 2306a, 2306b, 2306c. 

[0107] The DFE circuit of FIG. 23 can be determined after selecting, in step 2202 
of method 2200, the number of input values that are to be provided to a pipelined 
multiplexer loop during a clock period of operation of an integrated circuit. As 
shown in FIG. 23, eight values are input to multiplexer loop 2302. These input 
values are the outputs of the eight comparators 2308. The comparators 2308 
compare an input signal $y_n$ to eight possible feedback signals $f_0, \ldots, f_7$. The eight 
feedback signals are given by the following equations: 

$$
\begin{aligned}
f_0 &= -c_3 - c_2 - c_1 \\
f_1 &= -c_3 - c_2 + c_1 \\
f_2 &= -c_3 + c_2 - c_1 \\
f_3 &= -c_3 + c_2 + c_1 \\
f_4 &= +c_3 - c_2 - c_1 \\
f_5 &= +c_3 - c_2 + c_1 \\
f_6 &= +c_3 + c_2 - c_1 \\
f_7 &= +c_3 + c_2 + c_1
\end{aligned}
$$

where $c_3$, $c_2$, and $c_1$ are the three tap coefficients of the DFE. As described above, 
the present invention can be applied to multiplexer loop 2302. 
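
The eight feedback signals can be generated from the three tap coefficients as sketched below. The function names and the example coefficient values are hypothetical. The comparator polarity (an output of 1 when the input exceeds the feedback level) is also an assumption, since the text states only that the input signal is compared with each feedback signal.

```python
def feedback_levels(c1, c2, c3):
    """Return [f_0, ..., f_7]; the sign of c1 toggles fastest and that of c3 slowest."""
    return [s3 * c3 + s2 * c2 + s1 * c1
            for s3 in (-1, 1)
            for s2 in (-1, 1)
            for s1 in (-1, 1)]

def comparator_outputs(y_n, levels):
    """One bit per comparator: 1 when y_n exceeds the feedback level (assumed polarity)."""
    return [1 if y_n > f else 0 for f in levels]

# Hypothetical tap coefficients chosen only to make the levels easy to read.
levels = feedback_levels(c1=1, c2=2, c3=4)
print(levels)
print(comparator_outputs(0, levels))
```

The eight comparator bits correspond to the eight values supplied to multiplexer loop 2302, which then selects among them using the three most recent decisions.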
[0108] FIG. 24 illustrates an embodiment of the 3-tap DFE of FIG. 23 having 
pipelining and 2 levels of look-ahead in accordance with the invention. The 
circuit of FIG. 24 can be formed from the circuit of FIG. 23 by replacing delay 
2306a with a delay 2402, and by adding 2-level look-ahead network 2404. 
[0109] As can be seen in FIG. 24, the delay 2402 forms a part of every loop of the 
multiplexer loop. In addition, it can be seen that no additional multiplexers were 
added to the loops of the multiplexer loop. Furthermore, 2-level look-ahead 
network 2404 has been placed before each of the multiplexers that form part of 
the multiplexer loop shown in FIG. 23. 

[0110] FIG. 25 illustrates a detailed view of 2-level look-ahead network 2404. 
As shown in FIG. 25, 2-level look-ahead network 2404 is formed using 
multiplexers and delays. 2-level look-ahead network 2404 transforms eight input 
values $A_n$, $B_n$, $C_n$, $D_n$, $E_n$, $F_n$, $G_n$, and $H_n$ into eight intermediate values $O_1$, $O_2$, $O_3$, 
$O_4$, $O_5$, $O_6$, $O_7$, and $O_8$. As will be understood by persons skilled in the relevant 
arts, other circuits can be used to implement a 2-level look-ahead network. 
[0111] As described herein, circuits according to the invention can be used to form 
part of a larger integrated circuit. In embodiments of the invention, circuits 
according to the invention are combined with comparator circuits to form an 
integrated circuit. 

[0112] FIG. 26 illustrates a 4-unfolded comparator circuit 2600 with $f_i$ latch and 
pipeline-registers. Circuit 2600 is formed using comparators 2602, data flip-flops 
(DFF) 2604, and latches (LAT) 2608. Circuit 2600 can be used, for example, 
with a 4-unfolded and retimed circuit formed from the circuit of FIG. 24. The 
circuit of FIG. 24 can be unfolded and retimed in a manner similar to that 
described above for the circuit of FIG. 16. 

[0113] FIG. 27 illustrates a 6-bit compare circuit 2700. Circuit 2700 can be 
modified, when required, to form an n-bit compare circuit. The operation of 
circuit 2700 is described by the following equations: 

$$
\begin{aligned}
z_i &= a_i \odot b_i, \quad z_i = 1 \ \text{if}\ a_i = b_i \\
s_i &= a_i\,\bar{b}_i, \quad s_5 = a_5\,\bar{b}_5 \\
s_i &= 1 \;\Rightarrow\; a_i > b_i
\end{aligned}
$$
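
A bit-level compare built from per-bit equality and greater-than terms can be sketched as follows. The way the terms are combined here (a is greater than b when some bit has a_i = 1 and b_i = 0 while all higher bits are equal) is the standard magnitude-comparator construction and is an assumption; the description does not spell out how circuit 2700 combines them, and the function name `compare_greater` is hypothetical.

```python
def compare_greater(a, b, width=6):
    """Return 1 if a > b for unsigned width-bit values, using bitwise terms only."""
    greater = 0
    higher_bits_equal = 1
    for i in reversed(range(width)):          # most significant bit first
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        z_i = 1 - (a_i ^ b_i)                 # 1 when a_i equals b_i
        s_i = a_i & (1 - b_i)                 # 1 when a_i = 1 and b_i = 0
        greater |= higher_bits_equal & s_i    # first differing bit decides
        higher_bits_equal &= z_i
    return greater

# Exhaustive check against integer comparison for all 6-bit operand pairs.
all_match = all(compare_greater(a, b) == int(a > b)
                for a in range(64) for b in range(64))
print(all_match)
```

Passing a different `width` gives the n-bit variant mentioned in the text.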



[0114] FIG. 28 illustrates a serializer/deserializer that makes use of a 4-tap 
decision feedback equalizer integrated circuit 2800 according to an embodiment 
of the invention. Circuit 2800 is implemented using circuits similar to those 
described herein. Integrated circuit 2800 is illustrative and not intended to limit 
the invention. 

[0115] As will be understood by persons skilled in the relevant arts given the 
description herein, circuits having additional unfolding such as, for example, 
8-unfolded circuits or 16-unfolded circuits can also be implemented in accordance 
with the invention. These circuits exhibit the features of the invention and enable 
high data rate digital communications systems to be built. Using the invention 
and various degrees of unfolding, it is possible to build circuits according to the 
invention that operate, for example, at data rates in excess of 3 Gb/s, 5 Gb/s, and 
10 Gb/s. 

[0116] As described herein, the invention can be used in a wide variety of digital 
circuits to improve performance. For example, in embodiments, the invention is 
used to improve the performance of computer processing systems having one or 
more nested feedback loops or multiplexer loops. Computer processing systems 
typically include microprocessors or microcontrollers having one or more 
instruction decoders, arithmetic logic units and/or other specialized circuits that 
contain multiplexers in a feedback loop. These feedback loops limit operating 
speed or processing speed. As described herein, the invention can be used to 
improve the operating speed or processing speed of such circuits, and thereby 
improve system performance. Other types of digital circuits that can benefit from 
the invention will become apparent to persons skilled in the relevant arts given 
the description herein. 

CONCLUSION 

[0117] Various embodiments of the present invention have been described above, 
which are independent of the size of the multiplexer loop and/or steps of 
look-ahead used. These various embodiments can be implemented, for example, in 
optical/fiber, backplane, twisted-pair, and coaxial cable transceivers. These 
various embodiments can also be implemented in systems other than 
communications systems. It should be understood that these embodiments have 
been presented by way of example only, and not limitation. It will be understood 
by those skilled in the relevant art that various changes in form and details of the 
embodiments described above may be made without departing from the spirit and 
scope of the present invention as defined in the claims. Thus, the breadth and 
scope of the present invention should not be limited by any of the above-described 
exemplary embodiments, but should be defined only in accordance with 
the following claims and their equivalents. 